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APPROACHES FOR MULTI-STEP DENSITY FORECASTS WITH 
APPLICATION TO AGGREGATED WIND POWER 

By Ada Lau^ and Patrick McSharry^ 

University of Oxford 

The generation of multi-step density forecasts for non-Gaussian 
data mostly relies on Monte Carlo simulations which are computa- 
tionally intensive. Using aggregated wind power in Ireland, we study 
two approaches of multi-step density forecasts which can be obtained 
from simple iterations so that intensive computations are avoided. 
In the first approach, we apply a logistic transformation to normal- 
ize the data approximately and describe the transformed data using 
ARIMA-GARCH models so that multi-step forecasts can be iterated 
easily. In the second approach, we describe the forecast densities by 
truncated normal distributions which are governed by two parame- 
ters, namely, the conditional mean and conditional variance. We ap- 
ply exponential smoothing methods to forecast the two parameters 
simultaneously. Since the underlying model of exponential smooth- 
ing is Gaussian, we are able to obtain multi-step forecasts of the 
parameters by simple iterations and thus generate forecast densities 
as truncated normal distributions. We generate forecasts for wind 
power from 15 minutes to 24 hours ahead. Results show that the 
first approach generates superior forecasts and slightly outperforms 
the second approach under various proper scores. Nevertheless, the 
second approach is computationally more efficient and gives more ro- 
bust results under difi'erent lengths of training data. It also provides 
an attractive alternative approach since one is allowed to choose a 
particular parametric density for the forecasts, and is valuable when 
there are no obvious transformations to normalize the data. 

1. Introduction. Wind power forecasts are essential for the efficient op- 
eration and integration of wind power into the national grid. Since wind is 
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variable and wind energy cannot be stored efficiently, there are risks of power 
shortages during periods of low wind speed. Wind turbines may also need to 
be shut down when wind speeds are too high, leading to an abrupt drop of 
power supply. It is extremely important for power system operators to quan- 
tify the uncertainties of wind power generation in order to plan for system 
reserve efficiently [Doherty and O'Malley (2005)]. In addition, wind farm 
operators require accurate estimations of the uncertainties of wind power 
generation to reduce penalties and maximize revenues from the electricity 
market [Pinson, Chevallier and Kariniotakis (2007)]. 

Since the work of Brown, Katz and Murphy (1984) in wind speed fore- 
casting using autoregressive models, there has been an increasing amount of 
research in wind speed and wind power forecasts. Most of the early litera- 
ture focuses on point forecasts, and in recent years more emphasis has been 
placed on probabilistic or density forecasts because of the need to quan- 
tify uncertainties. However, the number of studies on multi-step density 
forecasts is still relatively small, not to mention the evaluation of forecast 
performances for horizons h> 1. Early works on multi-step density forecasts 
can be found in Davies, Pemberton and Petruccelli (1988) and Moeanaddin 
and Tong (1990), where the densities are estimated using recursive numer- 
ical quadrature that requires significant computational time. Manzan and 
Zerom (2008) propose a nonparametric way to generate density forecasts for 
the U.S. Industrial Production series, which is based on bootstrap methods. 
However, Monte Carlo simulations are required and this approach is also 
computationally intensive. 

One of the approaches to wind power forecasting is to focus on the mod- 
eling of wind speed and then transform the data into wind power through a 
power curve [Sanchez (2006)]. An advantage is that wind speed time series 
are smoother and more easily described by linear models. However, a major 
difficulty is that the shape of the power curve may vary with time, and also 
it is difficult to quantify the uncertainties in calibrating the nonlinear power 
curve. Another approach is to transform meteorological forecasts into wind 
power forecasts, where ensemble forecasts are generated from sophisticated 
numerical weather prediction (NWP) models [Taylor, McSharry and Buizza 
(2009), Pinson and Madsen (2009)]. This approach is able to produce reliable 
wind power forecasts up to 10 days ahead, but it requires the computation 
of a large number of scenarios as well as expensive NWP models. A third 
approach to wind power forecasting focuses on the direct statistical model- 
ing of wind power time series. In this case the difficulty lies on the fact that 
wind power time series are highly nonlinear and non-Gaussian. In particular, 
wind power time series at individual wind farms always contain long chains 
of zeros and sudden jumps from maximum capacity to a low value due to 
gusts of wind since turbines have to be shut down temporarily. Neverthe- 
less, it has been shown that statistical time series models may outperform 
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sophisticated meteorological forecasts for short forecast horizons within 6 
hours [Milligan, Schwartz and Wan (2004)]. Extensive reviews of the short 
term state-of-the-art wind power prediction are contained in Landberg et al. 
(2003), Giebel, Kariniotakis and Brownsword (2003) and Costa et al. (2008), 
in which power curve models, NWP models and other statistical models are 
discussed. 

In this paper we adopt the third approach and consider modeling the wind 
power data directly. We aim at short forecast horizons within 24 hours ahead, 
since for longer forecast horizons the NWP models may be more reliable. As 
mentioned above, wind power time series are highly nonlinear. Aggregating 
the individual wind power time series will smooth out the irregularities, 
resulting in a time series which is more appropriately described by linear 
models under suitable transformations. Aggregated wind power generation 
is also more relevant to power companies since they mainly consider the total 
level of wind power generation available for dispatch. Thus, it is economically 
important to generate reliable density forecasts for aggregated wind power 
generation. 

For this reason, as a first study, this paper considers the modeling of 
aggregated wind power time series. One may argue that utilizing spatiotem- 
poral correlations among individual wind farms may improve the results in 
forecasting aggregated wind power. We will show in Section 4 that this is 
not the case here, at least by the use of a simple multiple time series ap- 
proach. Unless one is interested in the power generated at individual wind 
farms, it is more appropriate to forecast the aggregated wind power as a 
univariate time series. We propose two approaches of generating multi-step 
ahead density forecasts for wind power generation, and we demonstrate the 
value of our approaches using wind power generation from 64 wind farms in 
Ireland. In the first approach, we demonstrate that the logistic function is 
a suitable transformation to normalize the aggregated wind power data. In 
the second approach, we describe the forecast densities by truncated normal 
distributions which are governed by two parameters, namely, the conditional 
mean and conditional variance. We apply exponential smoothing methods to 
forecast the two parameters simultaneously. Since the underlying model of 
exponential smoothing is Gaussian, we are able to obtain multi-step forecasts 
of the parameters by simple iterations and thus generate forecast densities 
as truncated normal distributions. Although the second approach performs 
similarly to the first in terms of our evaluation of the wind power forecasts, 
it has numerous advantages. It is computationally more efficient, its fore- 
cast performances are more robust, and it provides the flexibility to choose 
a suitable parametric function for the density forecasts. It is also valuable 
when there are no obvious transformations to normalize the data. 

Our paper is organized as follows. In Section 2 we describe the wind 
power data that we use in our study. Then we explain the two approaches 
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Fig. 1. The locations of 64 wmd farms in Ireland. There are 68 wind farms and wind 
power time series in the raw data, hut 4 pairs of wind farms are so close that they are 
essentially extensions from the corresponding old wind farm. As a result, we simply con- 
sider 64 wind farms here. The wind farms are distributed throughout Ireland, and Arklow 
Banks is the only offshore wind farm. 

of generating multi-step density forecasts in Section 3. The first approach 
concerning the logistic transformation is described in Section 3.1, while in 
Section 3.2 we give the details on the second approach using exponential 
smoothing methods and truncated normal distributions. In Section 4 we 
construct 4 benchmarks to gauge the performances of our approaches, and 
we evaluate the forecast performances using various proper scores. Finally, 
we conclude our paper in Section 5, where we summarize the benefits of our 
approaches and discuss important future research directions. 

2. Wind power data. We consider aggregated wind power generated 
from 64 wind farms in Ireland for approximately six months from 13-Jul- 
2007 to Ol-Jan-2008. The data are recorded every 15 minutes, giving a total 
number of 16,512 observations during the period. The locations of the wind 
farms are shown in Figure 1. One of the wind farms, known as Arklow Banks, 
is offshore.^ We sum up the capacities^ of all wind farms and the total ca- 
pacity is 792.355 MW. In order to facilitate comparisons between data sets 



■^Detailed information of individual wind farms, such as latitude, longitude and capacity, 
is provided by Eirgrid pic and can be found in Lau (2010). 

*Tlie capacity is the maximum output of a wind farm when all turbines operate at 
their maximum nominal power. 
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Fig. 2. Time series of normalized aggregated wind power from 64 wind farms in Ireland, 
where the aggregated wind power is normalized by the total capacity of 792.355 MW. The 
data are dissected into a training set and a testing set as shown by the dashed line. About 
four months of data are used for parameter estimation, and the remaining two months of 
data are used for out-of-sample evaluation. 

with different capacities, we normalize the aggregated wind power by divid- 
ing by the total capacity, that is, 792.355 MW, and so the normalized data 
is bounded within [0, 1]. We have checked that forecast results, in particular, 
for approaches involving nonlinear transformations, are in fact insensitive to 
the exact value of normalization.^ We dissect the data into a training set 
of about 4 months (the first 11,008 data points) for parameter estimation, 
and a testing set of about two months (the remaining 5504 data points) for 
out-of-sample forecast evaluations. Figures 2 and 3 show the original and 
the first differences of the normalized aggregated wind power respectively. 
It is clear that wind power data are nonstationary. The variance is chang- 
ing with time, showing clusters of high and low variability. Also, there are 
some occasional spikes. Figures 4 and 5 show the autocorrelation function 
of the wind power and its first differences respectively. Autocorrelation is 
significantly reduced by taking first differences. 

Since our aim is to generate short term forecasts up to 24 hours ahead, 
we do not focus on modeling any long term seasonality, which often appears 
in wind data due to the changing wind patterns throughout the year. For 
example, we can model a cycle of 90 days by regressing the data in the 



^In our paper the value of normalization must not be smaller than the total capacity 
since we will consider the logistic transformation (1). 
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Fig. 3. First differences of normalized aggregated wind power. It is clear that the variance 
changes with time, and there is volatility clustering as well as sudden spikes. The data are 
dissected by the dashed line into a training set and a testing set. 
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Fig. 4. Sample ACF of the time series of normalized aggregated wind power up to a lag 
of 7 days. The autocorrelations decay very slowly. It shows that the wind power data are 
highly correlated and may incorporate long memory effects. 
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Fig. 5. Sample ACF of the first differences of normalized aggregated wind power up to 
a lag of 7 days. The dashed lines are the confidence hounds at 2 standard deviations, 
assuming that the data follow a Gaussian white noise process. The autocorrelations are 
significantly reduced, hut they are still significant up to a lag of 7 days. 

training set with 16 harmonics of sines and cosines with periods T = j/(90 x 
96), j = 1, . . . , 16. This gives a fitted time series as shown in Figure 6 with 
= 0.395. One may then model the deseasonalized data, but studies show 
that results may be worse than those obtained by modehng the seasonahty 
directly [Jorgenson (1967)]. On the other hand, we are more interested in 
the diurnal cycle since it plays a more important role in intraday forecasts. 
Diurnal cycles may appear in wind data due to different temperatures and 
air pressures during the day and the night, and wind speeds are sometimes 
larger during the day when convection currents are driven by the heating 
of the sun. Thus, we try to fit the training data with harmonics of higher 
frequencies, such as those with T = j/96 where j is an integer. However, 
results show that those harmonics cannot help us to explain the variances in 
the data, and, thus, we decide to exclude the modeling of any diurnal cycle 
in this paper. 

Aggregated wind power time series, although smoother than those from 
individual wind farms, are non-Gaussian. In particular, they are nonnega- 
tive. Figure 7 shows the unconditional density of aggregated wind power. 
This distribution has a sharper peak than the normal distribution and is 
also significantly right-skewed. Common transformations for normalizing 
wind speed data include the logarithmic transformation and the square root 
transformation [Taylor, McSharry and Buizza (2009)]. However, those trans- 
formations are shown to be unsatisfactory for our particular wind power data 
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Fig. 6. Long term seasonality appears m the wind data. We regress the data in the train- 
ing set with 16 harmonics of sines and cosines with periods T — j/ (90 x 96), j = 1,. . . , 16, 
so that the maximum period is 90 days. The fit gives an E? — 0.395. The thin dashed line 
is the observed normalized wind power and the solid line is the fitted time series with a 
cycle of 90 days. The vertical dashed line dissects the data into a training set and a testing 
set. 
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Fig. 7. Unconditional empirical density of the normalized aggregated wind power, fitted 
using the data in the training set. The density is clearly non-Gaussian since the data is 
bounded. The density is skewed and has a sharper peak than the Gaussian distribution. 
This density gives the climatology forecast benchmark. 
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Fig. 8. Density of the wind power data after applying the logarithmic transformation, 
which remains non-Gaussian. The logarithmic transformation is a common transforma- 
tion to convert wind speed data into an approximate Gaussian distribution, but is clearly 
unappropriate for wind power data. The solid line is the fitted Gaussian distribution by 
maximizing the likelihood. 

as demonstrated in Figures 8 and 9. Nevertheless, we could transform the 
wind power data yt by a logistic transformation. This can be traced back 
to the work of Johnson (1949), and recently Bj0rnar Bremnes (2006) ap- 
plies this transformation to model wind power. The logistic transformation 
is given by 

(1) zt = log(^j^^, o<yt<i, 

and the transformed data zt gives a distribution which can be well approx- 
imated by a Gaussian distribution as shown in Figure 10. In contrast with 
individual wind power data, we do not encounter any values of zero or one 
and so (1) is well defined. In Section 3.1 we apply this transformation and 
build a Gaussian model to generate multi-step density forecasts for wind 
power. 

3. Approaches for density forecasting. Since our aim of this paper is to 
generate multi-step ahead density forecasts without relying on Monte Carlo 
simulations, it is important that our approach can be iterated easily. For this 
reason, in both of the following approaches, we consider a Gaussian model 
at certain stages so that we can iterate the forecasts in a tractable manner. 
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Square root transformed data 

Fig. 9. Density of the wind power data after applying the square root transformation, 
which remains non-Gaussian. The square root transformation is a common transforma- 
tion to convert wind speed data into an approximate Gaussian distribution, but is clearly 
inappropriate for wind power data. The solid line is the fitted Gaussian distribution by 
maximizing the likelihood. 
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Fig. 10. Density of the wind power data after applying the logistic transformation, which 
can be well approximated by a Gaussian distribution. The solid line is the fitted Gaussian 
distribution by maximizing the likelihood. 



MULTI-STEP WIND POWER DENSITY FORECASTS 



11 



1.5 



1 - 




1/8/07 1/9/07 1/10/07 1/11/07 1/12/07 1/1/08 
Time 

Fig. 11. First differences of the logistic transformed wind power. The variance is not 
changing as fast as before, and the amount of volatility clusterings is reduced. However, 
the time series is still nonstationary. The data are dissected by the dashed line into a 
training set and a testing set. 

3.1. Gaussian model for transformed data. In the first approach, we con- 
sider the transformation of wind power data into an approximately Gaus- 
sian distribution so that we could describe the transformed data by a sim- 
ple Gaussian model, in particular, the conventional ARIMA-GARCH model 
with Gaussian innovations. As discussed in Section 2, we transform the wind 
power data by the logistic function in (1). This transformation maps the sup- 
port from (0, 1) to the entire real axis, and Figure 10 shows that this results 
in an approximately Gaussian distribution. 

As wind power data are nonstationary, so are the transformed data and 
we consider the first differences wt = zt — zt-i- When compared with the 
original first differences yt — yt-i in Figure 3, the logistic transformed values 
Zt have fewer volatility clusterings and a smaller autocorrelation. This is 
shown in Figure 11 and Figure 12, respectively. Thus, we model zt by an 
ARIMA(p,l,g)-GARCH(r,s) model*' 

PI 

Wt = fJ. + '^cpiWt-i + '^Ojet-.j +et, £t\J^t~-i '~ ' iV(0, 0-^.^), 
1=1 1=1 

(2) 



^We have also considered modeling zt by ARMA(p, g)-GARCH(r, s) models, but they 
are not selected based on the BIG values. 
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Fig. 12. Sample ACF of the first differences of logistic transformed wind power up to 
a lag of 7 days. The dashed lines are the confidence bounds at 2 standard deviations, 
assuming that the data follow a Gaussian white noise process. The autocorrelations are 
slightly smaller than that for the original data, which is shown in Figure 5. 

r s 

= ^ + X] + ^j'^h-v 

i=i j=i 

where wt = zt — ^t-i, fi,4>i,9j,uj,ai, f3j are constant coefficients satisfying 
the usual conditions [Tsay (2005)] and J^t consists of all the past values of 
z up to time t. We also consider an ARIMA(p, model for zt with con- 
stant conditional variance Var[ef|/"f_i] = c^.j = a'^, so as to compare with 
the ARIMA(p, 1, (7)-GARCH(r, s) model. We select the models by minimiz- 
ing the Bayesian Information Criteria (BIC). Parameters are estimated by 
maximizing the Gaussian likelihood. 

The optimal /i-step ahead forecasts and can be easily ob- 

tained, and the corresponding /i-step ahead density forecast of is given 
by the Gaussian distribution, that is, fz^^^t N{zt+h\tT^t+h\t^ ^° 
^t+h\t ~ '^^'^[^t+h\J^t] can be obtained from {o^'^.^_^_j^^}j=l in a standard way, 
for example, by expressing the model in a moving average (MA) represen- 
tation [Tsay (2005)]. To restore the density of the normalized aggregated 
wind power 1^+^, we compute the Jacobian of the transformation in (1) 
where |J| = \dz/dy\ = l/[y(l — y)]- The density of is then given by 
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fYt+^M+h) = \J\fz,+^M+h), that is, 

1 1 



fvt+Mtiyt+h) 



(3) 

xe^p\(-(log(-y^)-z,^,u) )/{2al 



Note that (3) is the /i-step ahead conditional density of l^+h given the 
conditional point forecast of Zf^h\t time t. 

3.2. Exponential smoothing and truncated normal distribution. The sec- 
ond approach deals with the original wind power data yt directly. How- 
ever, since the data are non-Gaussian, there is a problem with the iter- 
ation of multi-step ahead density forecasts. We handle this by expressing 
the /i-step ahead conditional density as a function of its first two moments. 
For instance, the one-step ahead density is written as ft+i\t{y'i 
where fit+i\t = '^[yt+i\^t] is the conditional mean and ^t_^i\f = Var[y(+i|J^(] = 
Var[et_|_i|J^t] = is the conditional variance.^ At this moment, we do 

not attempt to figure out the exact form of the density function ft+i\t- Given 
any ft+m a model M for the dynamics, we can always evolve the density 
function so that 

ft+i\t{y] f''t+i\t,^t+i\t) ^ ft+hitiy] fi't+hit, ^t+h\t) ^ 
(4) fit+h\t = pfi (At+i|i , • • • , At+/i-i|i; yi,---,yt), 

^t+h\t ~ 'iM \^e\t+l\V ■ ■ ■ ''^e;t+h\t)' 

where denotes the process of evolving the dynamics and generating h- 
step ahead density forecasts under the unknown model M, which in practice 
may require the use of Monte Carlo simulations. Here p^^^ and q~^^ stand 
for functions that give the conditional mean and the conditional variance of 
yt, with parameters that depend on the model M and the forecast horizon 
h. 

It is difficult to obtain any closed form for ft+h\t if the distribution of 
innovations £t is non-Gaussian. Thus, we propose to use a two-step approach 
to approximate ft+h\t- the first step, we attempt to model the dynamics 



^In this paper, (5"^^^|j denotes the conditional variance of the data yt+h, while o-^.f^^t 
denotes the conditional variance of the innovation £t+h, so that in general a"f+ji|( is a 
function of (T^.t^jn with j = 1, . . . , h. 
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of the conditional mean it+h\t ^"^^ the conditional variance s^_^_m of the data 
using a Gaussian model G. This is expressed as 

(5) 



Step 1: it+h\t=PG\^t+i\u---,('t+h-i\t:yu---,yt) 



^t+h\t — ^G \^ e]t+l\V ■ ■ ■ ^ ^ e\t+h 



2 



where ^ and stand for functions that give the conditional mean and 
the conditional variance of yt+h-, with parameters that depend on the Gaus- 
sian model G and horizon h. In model G, the innovations are additive and 
are assumed to be i.i.d. Gaussian distributed. For example, G can be the 
conventional ARIMA-GARCH model with Gaussian innovations. This may 
be violated in reality, so lt+h\t and s^j^y^^i obtained from model G may not be 
the true conditional mean fit+h\t a-^id conditional variance d''^_^f^^^ respectively. 
They only serve as proxies to the true values. 

Although model G may not describe real situations, we rely on a second 
step for remedial adjustments such that the final density forecast is an ap- 
proximation to reality. In the second step, we assume that the /i-step ahead 
density ft+h\t can be approximated by a parametric function D, which is 
characterized by a location parameter and a scale parameter. In particular, 
the location parameter and the scale parameter are obtained from the con- 
ditional mean it+h\t a-iid the conditional variance respectively, which 
are estimated from the Gaussian model G. Thus, we simply take 

(6) Step 2: ft+h\t{y; k+h\t, ^t+h\t) - D{y; it+h\t, St+h\t) 

as the /i-step ahead density forecast where D is a function depending on two 
parameters only. As a result, the two-step approach may be able to give a 
good estimation of ft+h\t if (6) is a close approximation. In (6) the correct 
conditional mean fit+h\t ^^^d conditional variance (^"^^^ are generated by 

Pm{') and q^j^j {•) under the true model M, while the corresponding proxy 
values lt+h\t and are generated by p^q\-) and qQ\-) under a Gaussian 

model G. Empirical studies will be needed to determine the appropriate 
Gaussian model G as well as the best choice D in order to approximate the 
final density ft+h\t- 

For our normalized aggregated wind power y^, choosing D as the trun- 
cated normal distribution bounded within [0, 1] gives a good approximation 
of ft+h\t- Truncated normal distributions have been applied successfully in 
modeling bounded, nonnegative data [Sanso and Guenni (1999), Gneiting et 
al. (2006)]. We consider D to be parameterized by the location parameter 
^t+h\t and the scale parameter s^j^^, where iV(^t4.h|f, s^_,_;j|^) is the corre- 
sponding normal distribution without truncation. Note that (!-t+h\t and s'^^^ 
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will be the true conditional mean and conditional variance if the data are 
indeed Gaussian. The density function ft+h\t is then given by (6) so that 

1 ( (y- ^t+h\t 



ft+h\t{y;f^t+h\t,^t+h\t) = s (f 

St+h\t \ 

(7) 



St+h\t 



^t+h\t / V ^t+h\t 

for y G (0, 1), where ip and $ are the standard normal density and distribu- 
tion function respectively. 

Instead of directly estimating £t+h\t and s^_^^|j using the ARIMA-GARCH 
models, we find that a better way is to smooth the two parameters simultane- 
ously by exponential smoothing methods. Exponential smoothing methods 
have been widely and successfully adopted in areas such as inventory fore- 
casting [Brown and Meyer (1961)], electricity forecasting [Taylor (2003)] and 
volatility forecasting [Taylor (2004)]. A comprehensive review of exponen- 
tial smoothing is given by Gardner (2006). Hyndman et al. (2008) provide a 
state space framework for exponential smoothing, which further strengthens 
its value as a statistical model instead of an ad hoc forecasting procedure. 
Ledolter and Box (1978) show that exponential smoothing methods pro- 
duce optimal point forecasts if and only if the underlying data generating 
process is within a subclass of ARiMA{p, d,q) processes. We extend this 
property and demonstrate that simultaneous exponential smoothing on the 
mean and variance can produce optimal point forecasts if the data follow 
a corresponding ARIMA(p, d, q')-GARCH(r, s) process. This enables us to 
generate multi-step ahead forecasts for the parameters it+h\t a-iid s^^^^ by 
iterating the underlying ARIMA-GARCH model of exponential smoothing. 



3.2.1. Smoothing the location parameter only. For the simplest case, let 
us assume that the conditional variance of wind power is constant. This 
means that we only need to smooth the conditional mean it, while the con- 
ditional variance will be estimated directly from the data via estimating 
the variance of innovations s^. From now on, we refer to the conditional 
mean as the location parameter and the conditional variance as the scale 
parameter so as to remind us that they correspond to the truncated normal 
distribution. Again, the /i-step ahead scale parameter is obtained as 

a function of s^. 

By simple exponential smoothing, the smoothed series of the location 
parameter It is given by St, which is updated according to 



(8) 



St = ayt + (1 - a)St-i, 
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where yt is the observed wind power at time t and < a < 1 is a smoothing 
parameter. We initiahze the series by setting Si = yi, and the one-step ahead 
forecast is 4+i|t = St- Iterating (8) gives it+h\t = ^t- However, the forecast 

errors yt — it\t-i 8.re highly correlated, with a significant lag one sample 
autocorrelation of 0.2723. A simple way to improve the forecast is to add a 
parameter (f>s to account for autocorrelations in the forecast equation [Taylor 
(2003)]. We call this the simple exponential smoothing with error correction. 
The updating equation is still given by (8), but the forecast equation is 
modified as 

(9) it+i\t = St + 4>s{yt- St^i), 

where \(j)s\ < 1. Note that it is now possible to obtain negative values for 
it+i\t ill (9) a-iid in such cases is obviously not the true conditional 

mean. Nevertheless, this is not a problem here since it+i\t essentially serves 
as the location parameter of the truncated normal distribution, which can 
be negative. Following the taxonomy introduced by Hyndman et al. (2008), 
we denote (8) and (9) as the ETS(^, A^, iV|£^C) method, where ETS stands 
for both an abbreviation for exponential smoothing as well as an acronym 
for error, trend and seasonality respectively. The A inside the bracket stands 
for additive errors in the model, the first N stands for no trend, the second 
N stands for no seasonality and EC stands for error correction. 

By directly iterating (8) and (9) and expressing yt+h\t = h+h\ti '^^ have 

(10) it+^,\t = St + ""^'i^ ~f~'\ yt - St^i) + '^'(y* - •^^-i) 

for h> 1. To generate /i-step ahead forecasts of ■s^_^/j|j, it is important that 
we identify an underlying model corresponding to our updating and forecast 
equations (8) and (9). It can be easily checked that the FjTS^A, N , N\EC) 
method is optimal for the ARIMA(1, 1,1) model, in the sense that the fore- 
casts in (9) are the minimum mean square error (MMSE) forecasts. Ex- 
pressed in the form of an ARIMA(1, 1, 1) model with Gaussian innovations, 
the ETS{A,N,N\EC) method can be written as 

(11) wt = cl)sWt-i+et + {a-l)et-i, ''^ ' A(0, s^), 

where wt = yt — yt-i, £t is the Gaussian innovation with mean zero and 
constant variance s^, and a,4>s are the smoothing parameters in (8) and 
(9). It can also be easily verified that (10) is identical to the h-step ahead 
forecasts obtained from the ARIMA(1,1,1) model in (11). It then follows 
from the ARIMA(1, 1,1) model that the h-step ahead forecast variance is 
given by 

h 

(12) sl^, = s',Y.^l_^^ 
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where JIq = ^,^h = 0s +"^(1 ~ "^s )/(! ~ 4>s) for h>l, and is the estimated 
constant variance of the innovations. Note that in this case, (12) is the 

exphcit form of s'^^^^^ = (/^^(s^) (5)- 

Since maximum hkehhood estimators are well known to have nice asymp- 
totic properties, we estimate the three parameters a, cps and by maximiz- 
ing the likelihood of the truncated normal distribution ft+i\tiyt+i', ^t+i\tJ ^t+i\d 
One may also consider minimizing the mean continuous ranked probability 
scores (CRPS) of the density forecasts [Gneiting et al. (2005, 2006)], but this 
requires a much larger amount of computation. Although it may slightly im- 
prove the density forecasts, minimizing the CRPS is not appealing here since 
we aim at generating multi-step forecasts in a computationally efficient way. 
After obtaining the parameters, from (10) and (12) we can generate the 
h-step ahead density forecasts using (7). 

3.2.2. Smoothing both the location and scale parameters simultaneously. 
Next, we consider heteroscedasticity for the conditional variances of wind 
power. In this case, apart from smoothing the location parameter it, we 
also simultaneously smooth the scale parameter s^. In fact, we smooth the 
variance of innovations s^.^ and obtain the scale parameter sf as a function 
of s^.j as in (5). 

Equipped with the one-step ahead forecast of the location parameter 
it\t-ii we may calculate the squared difference between it\t-i and the ob- 
served wind power yt, that is, {yt — 4|t-i) i as the estimated variance s^.j at 
time t. Applying simple exponential smoothing, the smoothed series of s^.^ 
is given by Vt, which is updated according to 

(13) Vt = 7{yt - itit-if + (1 - l)Vt-u 

where yt is the observed wind power at time t, it\t-i is obtained by (9) and 
< 7 < 1 is a smoothing parameter. We initialize the series by setting Vi to 
be the variance of the data in the training set. In fact, the forecasts are not 
sensitive to the choice of initial values due to the size of the data set. The 
one-step ahead forecast is given by s^.^^^jj = Vt. Again, the forecast errors 
are highly correlated and it is better to include an additional parameter 
(py in the forecast equation to account for autocorrelations. The modified 
forecast equation is then given by 

(14) slt+i\t = yt + Mivt - it\t-if - Vt^i], 

where {(p^l < 1. Unfortunately, a major drawback of introducing this extra 
term in the forecast equation is that negative values of s'^.t_^_i\t ™ay oc- 
cur. Although this does not happen in our data, we modify our approach 
and consider smoothing the logarithmic transformed scale parameter log s^.^ 
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such that negative values are allowed since we aim at developing a general 

methodology that applies to different data sets. The smoothed series for 

logSg.j is then given by log Vf. Denoting et = yt- t\t-i and = et/VVi, the 
estimated logarithmic variance at time t is now chosen to be g{et) instead 

of log so that 

(15) g{et) = e{\et\-E[\et\]), 

where is a constant parameter. This ensures that g{et) is positive for large 
values of et and negative if et is small. The updating equation and the 
forecast equation are now written respectively as 

log 14 = 75(64) + (1 - 7) log Vt^i, 

(16) 

logs^.t+i|i = logVt + Maiet) - logVt-i], 

which are analogous to (13) and (14), except that a logarithmic transforma- 
tion is taken and {yt — it\t-i)'^ is replaced by g{et). We initialize the series 
by setting logVi = 0. In fact, the smoothing procedure is insensitive to the 
initial value due to the size of the data set. 

Now, the /i-step ahead forecasts of (it+h\t are still obtained from (10), but 
to generate h-step ahead forecasts of s'^_^f^^^ we need to identify an underlying 
model for this smoothing method. We summarize our exponential smoothing 
method for both £t and sf by combining (8), (9) and (16): 

St = ayt + (1 - a)St-i, 

it+i\t = St + (t)s{yt - St^i), 

(17) 

log = 75f(et ) + (1 - 7) log Vt^i, 
\ogsl.^_^^\^ = \ogVt + <t>v[g{et) - logVt-i], 

where g{et) is given in (15) and et as defined previously. There are four 
smoothing parameters a, 7, and a parameter 6 for the estimated loga- 

rithmic variance g{et). We adopt the taxonomy similar to that for exponen- 
tial smoothing for the location parameter as described in Section 3.2.1, and 
denote (17) as the W£S{A,N,N\EC)-{A,N,N\EC) method where the sec- 
ond bracket of {A,N ,N\EC) indicates the exponential smoothing method 
applied for smoothing the variance. We aim at identifying (17) with an 
ARIMA-GARCH model. Using (11) as the ARIMA(1,1,1) model for yt 
and writing £t =yt — ^t\t-ii last equation in (17) can be written as 

log ^l;t+i\t = log Vt + (t>v[9{et) - log Vt^i] 

= ig{(it) + {I - -i)\ogVt-i + (l)v[g{et) -\ogVt-i\ 
(18) = {-f + <P^)g{et) - (pvlogVt-i 
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+ (1 - 7){log4t - 4>vWt-l) - logFt„2]} 

= {l + (t^v)g{et) - 4>v9{et-i) + (1 -7)log4t. 

where we have used the updating equation in (16). This is the exponential 
GARCH, that is, EGARCH(2, 1) model for the conditional variance of in- 
novations £t [Nelson (1991)]. Unlike the conventional EGARCH models for 
asset prices, g{et) is symmetric since there is no reason to expect volatility 
to increase when wind power generation drops. In summary, the exponential 
smoothing method in (17) is optimal for the ARIMA(1, 1, 1)-EGARCH(2, 1) 
model, which can be written as 

wt = (t)sWt-i +et + {a- l)et_i, et\Ft^i ''^^ iV(0,s^.i), 

(19) 

log = (1 - 7) log S£;t_i + (7 + <i>v)g{et-i) - (t)v9{et-2), 

where wt = yt — yt~i and g{et) is given in (15), and we have assumed Gaus- 
sian innovations so that E[|ef|] = ^Jlj-ix. Similarly, we estimate the five pa- 
rameters a,<j)s,'y,4'v and 9 by maximizing the truncated normal likelihood 
as mentioned in Section 3.2.1. Now, equipped with the ARIMA(1, 1, 1)- 
EGARCH(2, 1) model in (19), the /i-step ahead forecasts for the scale pa- 
rameter •s^.j_|_^|^ can be easily obtained [Tsay (2005)]. Consequently, the h- 

step ahead forecasts can be expressed as a function of {s'^.-i._^j^f}j=i, 

which is analogous to (12) except that the expression is much more compli- 
cated and, in practice, one would simply iterate the forecasts. The /i-step 
ahead density forecasts can then be obtained using (7). 



4. Forecast evaluations. 



4.1. Benchmark models. In this section we apply the approaches of den- 
sity forecasts in Section 3 to forecast normalized aggregated wind power in 
Ireland. To evaluate the forecast performances of our approaches, we com- 
pare the results with four simple benchmarks. The first two benchmarks are 
the persistence (random walk) forecast and the constant forecast, which are 
both obtained as truncated normal distributions in (7). For the persistence 

forecast, we estimate the /i-step ahead location parameter it+h\t and scale 
2 

t+h\t 



parameter s?,,,, using the latest observations, that is. 



(20) £t+h\t = yt, Sf+,,|t = — 

for t > N. We find that taking = 48, that is, using data in the past 12 
hours, gives an appropriate estimate for Sj_|_^|^. 

For the constant forecast, we estimate the constant location parameter 
^t+h\t and scale parameter s^j^^^u using data in the whole training set. They 
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are given by the sample mean and the sample variance of the 11,008 obser- 
vations in the training set, so that 

,pll,008 X^^^'OOSf 

We have also considered generating the persistence and constant forecasts 
using the first approach as described in Section 3.1. However, our results 
show that the second approach gives a better benchmark in terms of forecast 
performance. 

On the other hand, the third and the fourth benchmarks are obtained by 
estimating empirical densities from the data. The third benchmark is the 
climatology forecast, in which an empirical unconditional density is fitted 
using data in the whole training set. The density has been shown in Figure 
7 previously. The fourth benchmark is the empirical conditional density 
forecast. To be in line with the use of exponential smoothing to estimate the 
location and scale parameters in Section 3.2, we consider an exponentially 
weighted moving average (EWMA) of a set of empirical conditional densities. 
Due to computational efficiency as well as reliability of density estimations, 
at each time t we essentially consider the EWMA of 14 empirical conditional 
densities (7cmp({Aj}), where each of them is fitted using observations in the 
past j days with j = 1, 2, . . . , 14 and {A^} = {yt-9%j+i,yt-9%j+2, . . . , yt} is the 
set of (96 X j) latest observations used to fit the empirical density. Up to 
an appropriate normalization constant, the /i-step ahead EWMA empirical 
conditional density forecast is given by 

14 

(22) ft+h\t{y) oc A(l - A)^-igemp({A^}) 

so that for any fixed forecast origin t, the /i-step ahead density forecasts 
are identical for all /i > 1. The smoothing parameter in (22) is estimated to 
be A = 0.1988, which is obtained by maximizing the log likelihood, that is, 
^ log /f_|_i|t(A; yt+i)i using the data in the training set only. It is possible to 
estimate a smoothing parameter for each forecast horizon h. However, the 
improvements are not significant and, thus, we simply keep using A = 0.1988 
for all horizons. Figure 13 shows the exponential decrease of the weights 
being assigned to different empirical densities gemp({A^}). 

In summary, we consider the following 4 benchmarks and 4 approaches of 
generating multi-step density forecasts, and compare their forecast perfor- 
mances from 15 minutes up to 24 hours ahead: 

1. Persistence forecast [TN] 

2. Constant forecast [TN] 

3. Climatology forecast [Empirical density] 
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Number of days 



Fig. 13. The exponential decrease of the weights A(l — X)^~^ assigned to the empirical 
conditional densities (;omp({Aj }) fitted with j days of latest observations, where X — 0.1988 
is obtained by maximizing the likelihood using data in the training set. The EWMA em- 
pirical conditional density forecasts are obtained as the weighted average o/5romp({Aj}). 



4. EWMA conditional density forecast [Empirical density] 

5. The ARIMA(2,1,3) model [LT] 

6. The ARIMA(4,1,3)-GARCH(1,1) model [LT] 

7. The ETS(^,A^,A^|EC) method [TN] 

8. The Y.TsIa,N,N\EC)~{A,N,N\EC) method [TN], 

where [LT] stands for logistic transformation and [TN] stands for truncated 
normal distribution, so as to remind us how the densities are generated. 

4.2. Point forecasts. First, let us evaluate the point forecasts generated 
by different approaches. We consider the expected values of the density 
forecasts as the optimal point forecasts. Given a forecast density, we can 
obtain the expected value directly by numerical integration. In particular, 
for forecast densities in the form of truncated normal distributions, one may 
easily write down the expected value as 

^ — (■t+h\t\ f —^t+h\t 



ff f l-it+H\t \ ( 

yt+h\t = k+h\t - k->rh\t I I 95 1 — ] ~ i 

(23) 



St+h\t 



cDl ^-^t+h\t \_^(-^t+h\t 



t+h\t J \ ^t^h\t 
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where (-t+h\t ^t+h\t location and scale parameters of the truncated 

normal distribution in (7). Note that due to the truncation, the distribution 
may not be symmetric and so the expected value is in general different 
from the location parameter, that is, yt+h\t ^t+h\t- Iii fact, referring to (5), 

h+h\t = PG\h+i\t, h+h-i\u yi,---,yt) is obtained according to a Gaus- 
sian model G, which may not give the true conditional mean yt+h\t of the 
data, and may even be negative. Since the final density ft+h\t is only ob- 
tained when an appropriate function D is chosen, we see that D transforms 
the conditional mean from it+h\t Gaussian data to the optimal forecast 
yt+h\t for our data. This is analogous to calculating optimal point forecasts 
when the loss function is asymmetric [Christoffersen and Diebold (1997), 
Patton and Timmermann (2007)]. Since the normalized aggregated wind 
power is bounded within [0, 1] , the loss function is always asymmetric unless 
the conditional mean is it+hlt = 0-5. When the conditional mean is not the 
optimal forecast, an additional term is added to compensate for the asym- 
metric loss. Christoffersen and Diebold (1997) suggest an approximation to 
calculate the optimal forecast for conditionally Gaussian data by assum- 
ing ilt+hlt = G{nt+h\t,o-t+h\t)^ where f^t+h\t,(^t+h\t conditional mean 
and conditional variance. Their method involves expanding G into a Taylor 
series. 

To evaluate the performances of different forecasting approaches, we cal- 
culate /i-step ahead point forecasts for each of the 5504 values in the testing 
set, where 1 < h < 96, that is, from 15 minutes up to 24 hours ahead. For 
each forecast horizon h, we calculate the mean absolute error (MAE) and 
the root mean squared error (RMSE) of the point forecasts, where the mean 
is taken over the 5504 /i-step ahead forecasts in the testing set. 

Figures 14 and 15 show the results of point forecasts under MAE and 
RMSE respectively. The rankings of different approaches are similar un- 
der either MAE or RMSE, except for the ETS{A,N,N\EC)-{A,N,N\EC) 
method which performs relatively better under MAE than RMSE. It per- 
forms the best under MAE for long forecast horizons beyond 14 hours. 
On the other hand, the two ARIMA-GARCH models outperform all other 
approaches for short forecast horizons within 12 hours, and are almost as 
good as the ETS(^, iV, iV|^;C)-(^, A^, |EC) method for horizons beyond 
12 hours. 

Interestingly, the ARIMA(2,1,3) model is performing almost identically 
to the ARIMA(4, 1, 3)-GARCII(l, 1) model. This phenomenon is in contrast 
with that for the ETS methods, where smoothing both the location and scale 
parameters do perform much better. It seems that including the dynamics 
of the conditional variance in the modeling of the logistic transformed wind 
power zt cannot improve the point forecasts under MAE or RMSE. These 
may be explained by Figure 3 which shows a significantly changing variance 
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Fig. 14. Mean absolute error (MAE) of point forecasts generated by different approaches 
for forecast horizons from 15 minutes to 24 hours ahead. The ARIMA-GARCH models 
on logistic transformed data perform best for short horizons less than 12 hours whereas 
the ETS{A, N, N\EC)-{A, N, N\EC) method with truncated normal distribution is best for 
horizons greater than 12 hours. 




Fig. 15. Root mean squared error (RMSE) of point forecasts generated by different ap- 
proaches for forecast horizons from 15 minutes to 24 hours ahead. Results are similar to 
those under MAE. 
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in the original wind power data y^, and by Figure 11 which shows a fairly 
constant variance for zt- We will further investigate this issue in the eval- 
uation of density forecasts using the probability integral transform (PIT), 
where we see that the conditional variance models are indeed capturing the 
changes in volatility better and thus generate more reliable density forecasts. 

As discussed in Section 1 , one may argue that spatiotemporal information 
among individual wind farms should be deployed to forecast aggregated wind 
power. To show that it is indeed better to forecast the aggregated power as 
a univariate time series, we consider a simple multiple time series approach. 
We obtain the best linear unbiased predictor (BLUP) of wind power gener- 
ation at a single wind farm using observations in the neighborhood, where 
the predictor is the best in the sense that it minimizes mean square errors. 
In other words, it is simply the kriging predictor which is widely applied 
in spatial statistics [Cressie (1993), Stein (1999)]. It can be easily extended 
to deal with spatiotemporal data [Gneiting, Genton and Guttorp (2007)], 
and more details could be found in Lau (2010). Computing the BLUP relies 
on the knowledge of the covariances of the process between different sites. 
In the context of spatiotemporal data, we obtain the BLUP by calculating 
the empirical covariances among the wind power at different spatial as well 
as temporal lags.® We then substitute the empirical covariances into the 
formula of BLUP. We apply this method and obtain 1, 6, 12 and 24 hours 
ahead point forecasts for the power generated at each individual wind farm, 
aggregate all power and normalize the result by dividing by 792.355 MW. 
We compute the RMSE of these aggregated forecasts, and find that aggre- 
gating individual forecasts cannot beat the performances of our approaches 
in Section 3. The results are displayed in Table 1. Of course, one may expect 
that more sophisticated spatiotemporal models may be able to outperform 
our methods here, but this will be of more interest to individual power gen- 
eration instead of aggregated ones as discussed in this paper. 

4.3. Density forecasts. For the density forecasts, we use the continuous 
ranked probability score (CRPS) to rank the performances. Gneiting and 
Raftery (2007) discussed the properties of CRPS extensively, showing that it 
is a strictly proper score and a lower score always indicates a better density 
forecast. CRPS has become one of the popular tools for density forecast 
evaluations, especially for ensemble forecasts in meteorology. We have also 
analyzed the performances of density forecasts using other common metrics 
such as the negative log likelihood (NLL) scores. However, we advocate the 



*One needs to decide the number of temporal lags to be included in calculating the 
BLUP. In our case of empirical covariances, we find that including temporal lags within 
the past hour is generally the best. Forecast performances deteriorate when one considers 
too many temporal lags. 
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Table 1 

Summary of point forecast performances of different approaches under RMSE. The bold 
numbers indicate the best approach at that forecast horizon 





1 hour 


6 hours 


12 hours 


24 hours 


Persistence forecast 


0.036 


0.138 


0.191 


0.229 


Constant forecast 


0.263 


0.263 


0.263 


0.263 


Climatology forecast 


0.285 


0.285 


0.285 


0.285 


EWMA conditional density 


0.177 


0.192 


0.203 


0.211 


ARIMA(2,1,3) 


0.032 


0.118 


0.177 


0.207 


ARIMA(4, 1, 3)-GARCH(l, 1) 


0.031 


0.117 


0.177 


0.209 


ETS(yl,iV,iV|£'C7) 


0.032 


0.126 


0.193 


0.230 


ETS(yl, N, N\EC)-[A, N, N\EC) 


0.034 


0.126 


0.176 


0.215 


BLUP (Multiple time series approach) 


0.037 


0.123 


0.188 


0.229 



use of CRPS for ranking different approaches since CRPS is more robust 
than the NLL scores, while the latter is always severely affected by a few 
extreme outliers [Gneiting et al. (2005)]. One may need to calculate the 
trimmed mean of the NLL scores in order to resolve this problem [Weigend 
and Shi (2000)]. Also, CRPS assesses both the calibration and the sharpness 
of the density forecasts, while the NLL scores assesses sharpness only. 

Similar to evaluating point forecasts, we generate /i-step ahead density 
forecasts for each of the 5504 values in the testing set where 1 < /i < 96. For 
each /i-step ahead density forecast ft+h\ti let Fi_^m be the corresponding 
cumulative distribution function. The CRPS is computed as 

(24) CRPS = f\Ft+h\t{y) - l{y - yt+h)f dy , 

Jo 

where l(-) is the indicator function which is equal to one when the argument 
is positive. Again, the mean CRPS is taken over the 5504 /i-step ahead 
density forecasts in the testing set. 

Figure 16 shows the performances of density forecasts under mean CRPS. 
The rankings are similar to those under MAE and RMSE in point fore- 
casts. The two ARIMA-GARCH models outperform all other approaches 
for all forecast horizons. Table 2 summarizes the main results. Again, the 
performances of the ARIMA(2,1,3) model are very similar to that of the 
ARIMA(4, 1, 3)-GARCH(l, 1) model and, in contrast, the ETS(^, A^, N\EC)- 
{A,N,N\EC) method is significantly better than the ETsIa,N,N\EC) 
method. To investigate the value of including the dynamics of conditional 
variances, we consider the probability integral transform (PIT). For one-step 
ahead density forecasts ft+i\t, the PIT values are given by 

rvt+i 

(25) z{yt+i)= / ft+i\t{y)dy- 

Jo 
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Table 2 

Summary of density forecast performances of different approaches under CRPS. The bold 
numbers indicate the best approach at that forecast horizon 





1 hour 


6 hours 


12 hoiurs 


24 hours 


Persistence forecast 


0.019 


0.077 


0.111 


0.137 


Constant forecast 


0.159 


0.159 


0.159 


0.159 


Climatology forecast 


0.175 


0.175 


0.175 


0.175 


EWMA conditional density 


0.098 


0.111 


0.120 


0.127 


ARIMA(2,1,3) 


0.017 


0.065 


0.100 


0.119 


ARIMA(4, 1, 3)-GARCH(l, 1) 


0.016 


0.063 


0.099 


0.120 


ETS(yl,iV,iV|£'C) 


0.017 


0.068 


0.109 


0.129 


ETS(yl, N,N\EC)-{A, N, N\EC) 


0.017 


0.069 


0.100 


0.124 



Diebold, Gunther and Tay (1998) show that the series of PIT values z are 
i.i.d. uniform if coincides with the true underlying density from which 

Ht+i is generated. For each forecasting approach, we calculate the percentage 
of PIT values below the 5th, 50th and 95th quantiles of the C/[0, 1] distribu- 
tion, that is, the percentage of PIT values smaller than 0.05, 0.5 and 0.95 
respectively. We denote them by ^5,^50 and P95, and calculate the devia- 
tions of the percentages (P5 — 5), (P50 — 50) and (P95 — 95). Figure 17 shows 
the deviations, where we only focus on the two ETS methods and the two 

0.18 
0.16 
0.14 



0.12 

CO 

C 0.1 
O 

i 0.08 



0.06 
0.04 
0.02 



'12 3 4 5 6 8 10 12 14 16 18 20 22 24 
Forecast horizon (Hour) 

Fig. 16. Mean continuous ranked probability score (CRPS) of density forecasts generated 
by different approaches for forecast horizons from 15 minutes to 24 hours ahead. Rankings 
are similar to those under MAE and RMSE in point forecasts. 
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Fig. 17. We calculate the percentages Pz,, P50 and P95 of PIT values smaller than 0.05, 
0.5 and 0.95 respectively, and calculate the deviations (P5 — 5) , (P50 — 50) and (P95 — 95). 
The 'Er£S{A,N,N\EC)-(A,N,N\EC) method and the ARIMA(4, 1,3)-GARCH(1, 1) 
model indeed generate better calibrated density forecasts. The overall calibration of the 
ETS{A, N, N\EC )-(A, N, N\EC ) method is the best, indicating that it provides the most 
reliable descriptions of the changing volatility over time. Note that a positive slope implies 
a ddnsity forecast which is over-conservative, while a negative slope implies the opposite. 

ARIMA-GARCH models. We see that the ETS{A, N, N\EC)~{A, N, N\EC) 
method and the ARIMA(4, 1, 3)-GARCH(l, 1) model indeed generate den- 
sity forecasts which are better calibrated. In particular, the overall calibra- 
tion of the ETS{A,N,N\EC)-{A,N,N\EC) method is the best, indicating 
that it provides the most reliable descriptions of the changing volatility over 
time. Note that a positive slope in Figure 17 implies a density forecast which 
is over-conservative and has a large spread, while a negative slope implies the 
opposite. Thus, for one-step ahead forecasts, the ARIMA-GARCH models 
are over-conservative, while the ETS methods are over-confident. 

Figure 17 only reflects information on the marginal distributions of the 
PIT values. Stein (2009) suggests that it is also valuable to evaluate the 
distributions conditioned on volatile periods. It is particularly important to 
capture the variance dynamics during times of large volatilities, since for 
most of the times one does not want to underestimate the risk by propos- 
ing an over-confident density forecast. Underestimating large risks usually 
leads to a more disastrous outcome than overestimating small risks. Fol- 
lowing Stein (2009), we compare the ability of the approaches in capturing 
volatility dynamics during the largest 10% of variance. To estimate the vari- 
ance of the data in the testing set, we directly adopt the persistence forecast 
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Fig. 18. Estimated variance of data in the testing set using the persistence forecast 
in (20), which essentially gives the 12-hour moving average of realized variance. 
Clearly, the variance changes with time and the largest values mostly occur in early De- 
cember. 

™ (20), which essentiahy gives the 12-hour moving average of reahzed 
variance. Figure 18 shows the changing variance, where the largest values 
mostly occur in early December. The times corresponding to the largest 
10% of variance are selected and we compare the distribution of z[yt+i) at 
those times. The PIT diagrams are shown in Figure 19. It demonstrates that 
the ARIMA-GARCH model indeed gives better calibrated one-step ahead 
density forecasts than the ARIMA model during volatile periods. The dif- 
ferences between the two ETS methods are even more significant, where 
the ETS(^, A^, |£'C) method gives over-confident density forecasts that 
underestimate the spread. 

5. Conclusions and discussions. In this paper we study two approaches 
for generating multi-step density forecasts for bounded non-Gaussian data, 
and we apply our methods to forecast wind power generation in Ireland. 
In the first approach, we demonstrate that the logistic transformation is a 
good method to normalize wind power data which are otherwise highly non- 
Gaussian and nonstationary. We fit ARIMA-GARCH models with Gaus- 
sian innovations for the logistic transformed data, and out-of-sample fore- 
cast evaluations show that they generate both superior point and density 
forecasts for all horizons from 15 minutes up to 24 hours ahead. A second ap- 
proach is to assume that the /i-step ahead conditional densities are described 
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Fig. 19. Histograms of PIT values conditioned on the largest 10% of estimated variance, 
where the one-step ahead density forecasts are generated using the ARIMA(2, 1, 3) model 
(top left), the ARIMA(4,1,3)-GARCH(1,1) model (top right), the KT'&^A, N , N\EC ) 
method (bottom left) and the KTS{A,N,N\EC )-(A,N,N\EC ) method (bottom right). The 
dotted lines correspond to 2 standard deviations from the uniform density. 



by a parametric function D with a location parameter i and scale parameter 
s^, namely, the conditional mean and the conditional variance of yt that are 
generated by an appropriate Gaussian model G. Results show that choosing 
D as the truncated normal distribution is appropriate for aggregated wind 
power data, and in this case I and are the mean and variance of the 
original normal distribution respectively. We apply exponential smoothing 
methods to generate /i-step ahead forecasts for the location and scale param- 
eters. Since the underlying models of the exponential smoothing methods 
are Gaussian, we are able to obtain multi-step forecasts by simple iterations 
and generate forecast densities as truncated normal distributions. 

Although the approach using exponential smoothing methods with trun- 
cated normal distributions cannot beat the approach considering logistic 
transformed data, they are still a useful alternative to produce good density 
forecasts due to several reasons. First, forecast performances of the exponen- 
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tial smoothing methods are more robust under different lengths of training 
data, especially when the size of the training set is relatively small and the 
estimation of the ARIMA-GARCH models may not be reliable to extrapo- 
late into the testing set. This has been demonstrated in our data, where we 
take 40% of the data as the training set and the remaining as the testing set. 
In such a case, the ETS(A, N, N\EC)-{A, N, N\EC) method performs better 
than the approach with logistic transformed data [Lau (2010)]. Second, in 
the first approach using ARIMA-GARCH models, we have to select the best 
model using BIG whenever we consider an updated training set. This is not 
necessary for the exponential smoothing methods. Third, an advantage of 
forecasting by exponential smoothing methods is that it is computationally 
more efficient to calculate point forecasts due to the closed form of density 
function that we have chosen, namely, the truncated normal distribution D. 
On the other hand, in the first approach, we have to transform the Gaussian 
densities and calculate the expected value of the transformed densities by 
numerical integrations, which require much more computational power. The 
second and third points are critical since, in practice, many forecasting prob- 
lems require frequent online updating. Finally, the second approach allows 
us to choose a parametric function D for the forecast densities, which gives 
us more flexibility and one may generate improved density forecasts by test- 
ing various possible choices of D. This advantage is particularly important 
when there are no obvious transformations to normalize the data, and when 
there is evidence that supports simple parametric forecast densities. 

In summary, we have developed a general approach of generating multi- 
step density forecasts for non-Gaussian data. In particular, we have ap- 
plied our approaches to generate multi-step density forecasts for aggregated 
wind power data, which would be economically valuable to power compa- 
nies, national grids and wind farm operators. It would be interesting and 
challenging to propose modified methods based on our current approaches, 
so that reliable density forecasts for individual wind power generation could 
be obtained. Individual wind power time series are interesting since they are 
highly nonlinear. Sudden jumps from maximum capacity to zero may occur 
due to gusts of winds, and there may be long chains of zero values because 
of low wind speeds or maintenance of turbines. Characteristics of individual 
wind power densities include a positive probability mass at zero as well as 
a highly right-skewed distribution, and it would be challenging to generate 
multi-step density forecasts for individual wind power data. Another impor- 
tant area of future research is to develop spatiotemporal models to generate 
density forecasts for a portfolio of wind farms at different locations. Re- 
cent developments in this area include Hering and Genton (2010). Some 
possible approaches include the process-convolution method developed and 
studied by Higdon (1998), which has been applied to the modeling of ocean 
temperatures and ozone concentrations. Another possible approach is the 
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use of latent Gaussian processes. Those approaches have been studied by 
Sanso and Guenni (1999) who consider the power truncated normal (PTN) 
model, and by Berrocal, Raftery and Gneiting (2008) who consider a mod- 
ified version of the PTN model called the two-stage model. Spatiotemporal 
models will be important to wind farm investors to identify potential sites 
for new farms. It would also be of great importance to the national grid sys- 
tems where a large portfolio of wind farms are connected, and sophisticated 
spatiotemporal models may be constructed to improve density forecasts for 
aggregated wind power by exploring the correlations of power generations 
between neighboring wind farms. 
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